Abstract

Enterprises increasingly rely on data analysis to obtain effective management advice. Managers
and researchers try to identify the major factors that lead to employees’ promotion and
resignation. Most previous analyses are based on questionnaire surveys, which usually cover
only a small fraction of employees and contain biases caused by psychological defense. In this
paper, we collect a data set consisting of all the employees’ work-related interactions (action
network, AN for short) and online social connections (social network, SN for short) in a
company, which allows us to reveal the correlations between structural features and employees’
career development, namely promotion and resignation. Through statistical analysis, we show
that the structural features of both AN and SN are correlated with and predictive of employees’
promotion and resignation, and that AN shows the higher correlation and predictability. More
specifically, the in-degree in AN is the most relevant indicator for promotion, while the
k-shell index in AN and the in-degree in SN are both very predictive of resignation. Our
results provide a novel and actionable understanding of enterprise management and suggest that
enhancing the interplay among employees, whether work-related or social, can largely improve
employee loyalty.


1. Introduction 

Employees are the core and soul of enterprises, and human resources departments therefore
always try to gain a comprehensive understanding of employees and to provide managers with
valuable and effective suggestions. This issue has attracted much attention from
interdisciplinary research domains. Some factors related to employees’ performance have been
revealed, such as centrality, self-monitoring orientation, sex, and communication patterns.
Besides performance, predicting and controlling resignation in advance is also valuable, since
employees’ resignation may result in great losses for enterprises. Feeley and Barnett showed
that people who are highly connected and occupy more central positions in the employee network
are less likely to resign. Ten years later, this group reported that employees with a greater
number of out-degree links to friends are less likely to resign. Similar indicators include
degree, betweenness and closeness.

Traditional studies are based on data sets gathered from questionnaire surveys, which are
probably subjective owing to psychological defense. Fortunately, recent IT development brings
us more reliable information, such as email data, communication records, online social
relations, contact networks built by wearable electronic sensors, and so on. In this paper, we
collect anonymized work-related interactions and social connections of employees from a social
network platform developed and used by a Chinese company with more than a hundred employees,
named Beijing Strong Union Technology Co. Ltd. (Strong Union for short). Accordingly, we can
build two directed networks, the action network (AN) and the social network (SN), where nodes
represent employees and links indicate work-related interactions and social connections,
respectively. A few studies have already shown that employees who are more central in the
networks or have more connections to others are less likely to resign, while managers whose
local networks are rich in structural holes could be promoted faster. Here, we take more
structural features into consideration to reveal the correlations between topology and career
changes. In addition, we would like to show which network is more related to promotion or
resignation.

The two networks characterize relationships among employees from different aspects. AN
reflects work-related interactions such as downloading working files and working blogs within
working groups, while SN reflects social connections among all employees of the company,
through which employees can share their family and entertainment activities (usually not
related to work) with others. Via statistical analysis, we find that the structural features
of both AN and SN are correlated with and predictive of employees’ promotion and resignation,
and that AN shows the higher correlation and predictability. More specifically, the in-degree
in AN is the most relevant indicator for promotion, while the k-shell index in AN and the
in-degree in SN are both very predictive of resignation.

This paper is organized as follows. Section 2 presents the description and basic statistics of
the data set. The significance of the selected structural features and the logistic regression
are presented in Section 3 and Section 4, respectively. Finally, we give our conclusions and
discussion in Section 5.

2. Data Description 

The data set is gathered from a social network platform which involves all the 104 employees
of Strong Union up to the end of 2013. It consists of two types of information: work-related
actions and online social connections. The work-related actions of employees include reposting
and commenting on working blogs, assigning and reporting working tasks, and downloading
working files. The online social connections, which are similar to the follower-followee
relationships in www.twitter.com, are created regardless of working groups; namely, everybody
can build relationships with others even if they do not belong to any common working group.
Accordingly, we can build two directed networks, the action network (AN) and the social
network (SN), where nodes represent employees and links indicate work-related interactions and
social connections, respectively. More specifically, a link from u to v in AN means that
employee u has reposted, commented on or downloaded v’s working files, and a link from u to v
in SN means that u has followed v. By the date of the data collection, 25 employees had
resigned and 12 employees had been promoted.
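The construction of a directed network from interaction records can be sketched as follows. This is only a minimal illustration: the record format (pairs of actor and target), the function names and the toy data are our assumptions, not the platform's actual schema, and we assume that repeated interactions between the same pair collapse into a single unweighted link.

```python
def build_directed_network(records):
    """Map every employee to the set of employees they point to.

    `records` is an iterable of (source, target) pairs; for AN a pair means
    that `source` reposted, commented on or downloaded `target`'s content,
    for SN that `source` follows `target`. (Hypothetical input format.)
    """
    successors = {}
    for source, target in records:
        successors.setdefault(source, set())
        successors.setdefault(target, set())
        if source != target:  # assumption: self-interactions are ignored
            successors[source].add(target)
    return successors


def in_out_degrees(successors):
    """Return (k_in, k_out) dictionaries for a directed network."""
    k_out = {u: len(targets) for u, targets in successors.items()}
    k_in = {u: 0 for u in successors}
    for targets in successors.values():
        for v in targets:
            k_in[v] += 1
    return k_in, k_out


# Toy records, invented for illustration only.
action_records = [("a", "boss"), ("b", "boss"), ("c", "boss"),
                  ("boss", "a"), ("a", "boss")]
an = build_directed_network(action_records)
k_in, k_out = in_out_degrees(an)
print(k_in["boss"], k_out["boss"])
```

In this toy example the duplicate record from "a" to "boss" yields a single link, so the in-degree counts distinct colleagues rather than individual actions.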

Table 1 summarizes the basic topological features of the two networks. Compared with AN, SN
has a higher density and average degree. Both networks have high clustering coefficients,
quite small average shortest path lengths and very small modularity, since they are so dense
that all the nodes are grouped together. Unlike many known social networks, the two networks
are both highly disassortative. This is because many work-related interactions and social
connections happen between leaders and ordinary employees: the former usually have large
degrees while the latter are usually less connected.


Table 1: The basic topological features of the two networks. $N$ is the total number of nodes.
$D$ denotes the density of the directed networks, calculated as $|E|/[N(N-1)]$, where $|E|$ is
the number of directed links. The other features are calculated after converting the directed
networks into undirected ones. $\langle k \rangle$ is the average degree, $\langle d \rangle$
is the average shortest path length, $r$ is the assortativity coefficient, and $C$ is the
clustering coefficient. $H$ is the degree heterogeneity quantified by a variant of the Gini
index: the higher $H$ is, the more heterogeneous the degree distribution is. $Q$ is the
modularity of a network calculated by the Louvain method.


Networks    N      D       ⟨k⟩      ⟨d⟩     r        C       H       Q
AN          97     0.26    35.73    1.64    -0.27    0.76    0.35    0.09
SN          104    0.29    47.04    1.55    -0.41    0.81    0.32    0.08
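The density and average degree in Table 1 follow directly from the definitions in the caption. The sketch below illustrates them on a toy network; the dictionary representation (each node mapped to the set of nodes it points to, with every node present as a key) and the function names are our assumptions for illustration.

```python
def density(successors):
    """Directed density D = |E| / (N (N - 1))."""
    n = len(successors)
    n_links = sum(len(targets) for targets in successors.values())
    return n_links / (n * (n - 1))


def mean_undirected_degree(successors):
    """Average degree <k> after merging link directions."""
    neighbours = {u: set() for u in successors}
    for u, targets in successors.items():
        for v in targets:
            neighbours[u].add(v)
            neighbours[v].add(u)
    return sum(len(ns) for ns in neighbours.values()) / len(neighbours)


# Toy directed network, invented for illustration: a->b, a->c, b->a.
toy = {"a": {"b", "c"}, "b": {"a"}, "c": set()}
print(density(toy), mean_undirected_degree(toy))
```

Here the reciprocal pair a->b and b->a counts as two directed links for $D$ but as a single undirected link for $\langle k \rangle$.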


3. Significance of Structural Features 


Previous studies suggested that some structural features of employees are effective for
evaluating resignation. Here we select three well-known indicators: in-degree ($k_i$),
out-degree ($k_o$) and k-shell index ($k_s$). Given a network $G$, $k_i$ is the number of
links pointing to a node and $k_o$ is the number of a node’s outgoing links. The $k_s$ of a
node is the largest number such that the node belongs to the $k_s$-core, which is the maximal
subgraph of $G$ in which all nodes have degree no less than $k_s$. In the calculation of
$k_s$, the networks are treated as undirected. This index characterizes an employee’s position
in the network: employees with higher $k_s$ are more central, and vice versa. The k-shell
index and its variants have been used to identify influential spreaders in social networks.
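The k-shell index can be obtained by the usual iterative pruning procedure: nodes of degree at most $k$ are removed repeatedly until none remain, and each removed node is assigned index $k$. The following is a minimal sketch of that procedure on an undirected toy graph; the function name and the adjacency-dictionary input are our assumptions.

```python
def k_shell(neighbours):
    """Return the k-shell index of every node of an undirected graph.

    `neighbours` maps each node to the set of its neighbours.
    """
    remaining = {u: set(vs) for u, vs in neighbours.items()}  # working copy
    shell = {}
    k = 0
    while remaining:
        # Nodes whose current degree is at most k cannot be in the (k+1)-core.
        peel = [u for u, vs in remaining.items() if len(vs) <= k]
        if not peel:
            k += 1  # nothing left to prune at this level
            continue
        for u in peel:
            shell[u] = k
            for v in remaining.pop(u):
                remaining[v].discard(u)
    return shell


# Toy graph, invented for illustration: a triangle a-b-c with a pendant node d.
toy = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
shell = k_shell(toy)
print(shell)
```

In the toy graph the pendant node falls into the 1-shell while the triangle forms the 2-shell, even though node a has a larger degree than b and c. This illustrates why many nodes can share the same $k_s$ despite having different degrees.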

Because the career changes can be treated as binary variables, i.e., promotion vs.
non-promotion and resignation vs. non-resignation, we adopt the AUC (area under the receiver
operating characteristic curve) value to measure the correlation between structural features
and career changes. This metric can be interpreted as the probability that a randomly chosen
promoted (non-resigned) employee has a higher value of $k_i$, $k_o$ or $k_s$ than a randomly
chosen non-promoted (resigned) employee. In the implementation, we compare all the pairs of
employees with opposite states (promotion vs. non-promotion and resignation vs.
non-resignation) to calculate the AUC. Suppose that among $m$ comparisons in total, there are
$m'$ in which the promoted (non-resigned) employee has the higher value and $m''$ in which the
two values are equal; the AUC value is then

$$\mathrm{AUC} = \frac{m' + 0.5\,m''}{m}.$$

If the topological features are independent of the career changes, the AUC value should be
about 0.5. Therefore, the degree to which the value exceeds 0.5 indicates the strength of the
correlation.
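The pairwise comparison described above can be written in a few lines. The sketch below is illustrative only: the function name and the toy feature values are our assumptions, and ties are counted with weight 0.5, as in the definition.

```python
def auc(positive_values, negative_values):
    """AUC from exhaustive pairwise comparison.

    `positive_values` holds the feature (e.g. in-degree) of the promoted or
    non-resigned employees, `negative_values` that of the opposite group.
    """
    m = len(positive_values) * len(negative_values)  # total comparisons
    higher = sum(1 for p in positive_values for q in negative_values if p > q)
    equal = sum(1 for p in positive_values for q in negative_values if p == q)
    return (higher + 0.5 * equal) / m


# Toy in-degree values, invented for illustration.
promoted = [30, 25, 12]
others = [12, 8, 20, 5]
print(auc(promoted, others))  # 10 higher, 1 tie out of 12 pairs -> 0.875
```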

As indicated by the high AUC values in Fig. 1, promoted employees have clearly higher values
for all three structural metrics, while resigned employees have much lower values than others.
These results strongly imply that employees who are more central or more highly connected are
more likely to be promoted and less likely to resign. In addition, there are three secondary
results: (i) AN is more related to both promotion and resignation than SN; (ii) the in-degree
in AN, namely the number of colleagues who have responded to the target employee’s working
files, is the most critical indicator for promotion; (iii) the out-degree in AN, indicating
the number of employees the target employee follows, is the most relevant indicator for
resignation: the more you follow, the less likely you quit.

Figure 1: (Color online) AUC values for promotion and resignation. Subfigure (a) represents
the probability that a randomly chosen promoted employee has a higher value of $k_i$, $k_o$ or
$k_s$ than a randomly chosen non-promoted employee. Subfigure (b) represents the probability
that a randomly chosen resigned employee has a lower value of $k_i$, $k_o$ or $k_s$ than a
randomly chosen non-resigned employee.


4. Logistic Regression 


Since each employee is in one of only two states in each case (promotion vs. non-promotion and
resignation vs. non-resignation), we adopt logistic regression to identify the promoted
employees and the resigned employees. It is a probabilistic statistical classification model,
usually used to predict the binary outcome of a response variable from one or more predictor
features (e.g., the three indicators in this paper). The binary logistic regression model can
be represented by the conditional probability


$$P(1 \mid \mathbf{x}) = \frac{1}{1 + e^{-\left(b_0 + \sum_{i=1}^{m} b_i x_i\right)}}, \tag{1}$$


where $\mathbf{x} = (x_1, \ldots, x_m)$ is the vector of features, such as $k_i$, $k_o$ and
$k_s$, and $b_0, b_1, \ldots, b_m$ are the coefficients to be estimated from the data. Here we
use 1 to indicate promotion and resignation, and 0 to indicate non-promotion and
non-resignation.
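To make Eq. (1) concrete for the single-feature case ($m = 1$), the sketch below evaluates the logistic function and estimates $b_0$ and $b_1$ by plain gradient ascent on the log-likelihood. The fitting procedure, the learning rate, the number of steps and the toy data (a feature rescaled to $[0, 1]$) are all our assumptions for illustration; the estimation routine actually used for the results in this paper is not specified here.

```python
import math


def predict_proba(x, b0, b1):
    """P(1 | x) of Eq. (1) for a single feature x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Estimate (b0, b1) by gradient ascent on the mean log-likelihood.

    Illustrative stand-in for a statistics package; hyperparameters are
    arbitrary choices that work for small, rescaled toy data.
    """
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            error = y - predict_proba(x, b0, b1)
            grad0 += error
            grad1 += error * x
        b0 += learning_rate * grad0 / n
        b1 += learning_rate * grad1 / n
    return b0, b1


# Toy data, invented for illustration: rescaled in-degree and promotion label.
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print(b0, b1, predict_proba(0.9, b0, b1))
```

A positive $b_1$ means that the estimated probability of the outcome coded as 1 increases with the feature value.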

By assigning $b_0, b_1, \ldots, b_m$ proper values, we can calculate the probability
$P(1 \mid \mathbf{x})$ through Eq. (1) for each employee. Then we are able to divide the
employees into two groups according to a chosen cutoff probability (which is usually set to
0.5 by default). One group corresponds to promotion (resignation), and the other corresponds
to non-promotion (non-resignation). We consider a model better if more employees are
identified correctly, namely, if it is more capable of identifying which employees are
promoted (resigned) and which are not. To measure the accuracy of classification, we first
introduce a basic metric, precision, defined as

$$\mathrm{precision} = \frac{n}{N},$$


where $N$ is the total number of employees and $n$ is the number of correctly classified
employees. Nevertheless, considering precision alone is not good enough in this case, because
the numbers of both promoted employees and resigned employees are very small. We could then
get a high precision by simply classifying all the employees into the non-promoted or
non-resigned class (up to 88.5% for promotion and 76% for resignation), but no relevant
employees (i.e., promoted employees and resigned employees) would be found. Fortunately,
recall, which is known as sensitivity in binary classification, can compensate for this
inadequacy. It is defined as

$$\mathrm{recall} = \frac{n'}{N'},$$

where $n'$ is the number of relevant employees who are retrieved, and $N'$ is the total number
of relevant employees, i.e., those who have resigned or have been promoted.
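The two measures, and the trivial baseline mentioned above, can be checked with a short sketch. The function names are ours, and classifying an employee as positive when the probability is at least the cutoff (rather than strictly above it) is an assumption. Note that precision as defined here, $n/N$, is the fraction of all employees classified correctly, which is often called accuracy elsewhere.

```python
def classify(probabilities, cutoff=0.5):
    """Label an employee 1 if the predicted probability reaches the cutoff."""
    return [1 if p >= cutoff else 0 for p in probabilities]


def precision(labels, predictions):
    """n / N: fraction of all employees that are classified correctly."""
    n_correct = sum(1 for y, y_hat in zip(labels, predictions) if y == y_hat)
    return n_correct / len(labels)


def recall(labels, predictions):
    """n' / N': fraction of relevant employees that are retrieved."""
    n_retrieved = sum(1 for y, y_hat in zip(labels, predictions)
                      if y == 1 and y_hat == 1)
    return n_retrieved / sum(labels)


# Baseline from the text: 104 employees, 12 promoted and 25 resigned,
# with everybody assigned to the negative class.
promotion_labels = [1] * 12 + [0] * 92
resignation_labels = [1] * 25 + [0] * 79
all_negative = [0] * 104
print(round(precision(promotion_labels, all_negative), 3))    # 92/104 -> 0.885
print(round(precision(resignation_labels, all_negative), 2))  # 79/104 -> 0.76
print(recall(promotion_labels, all_negative))                 # 0.0
```

The baseline reproduces the 88.5% and 76% figures while retrieving none of the relevant employees, which is why recall is needed alongside precision.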






Figure 2: (Color online) Precision-recall curves with various cutoff probabilities. Subfigure
(a) shows the precision-recall curves for promotion, and subfigure (b) shows the curves for
resignation. AN represents the action network, and SN represents the social network. $k_i$,
$k_o$ and $k_s$ are the variables selected to classify which employees are promoted or
resigned. The significance values of the chi-square test [28] for the constant and the
coefficient $b_1$ in the six models for promotion are (0, 0), (0.002, 0), (0.221, 0.208),
(0, 0), (0.058, 0), and (0.987, 0.987), respectively. Those for resignation are (0, 0.755),
(0, 0.164), (0, 0.022), (0, 0.021), (0.009, 0.054) and (0, 0.007), respectively. In general, a
value below 0.05 indicates a good fit.


To figure out which feature is most related to promotion/resignation, in each run we choose
only one feature to form the vector $\mathbf{x}$. The feature with higher precision and recall
is considered to be more related. Note that we do not put the three features into one model,
because they are highly correlated with each other. Nor do we compare the estimated
coefficients of these features, because doing so would lead to inaccurate conclusions. The
values of precision and recall depend on the cutoff probability, so by varying the cutoff
probability from 0 to 1 we obtain a precision-recall curve. As shown in Fig. 2, the most
accurate classification corresponds to the curve closest to the upper right. For a more direct
and clearer comparison, we introduce another metric, named F1, which synthesizes precision and
recall in the following way:

$$\mathrm{F1} = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}.$$


It is the harmonic mean of precision and recall. An F1 score reaches its best value at 1 and
its worst at 0. The F1 scores for individual features under different cutoff probabilities are
shown in Fig. 3.
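Scanning the cutoff probability and recording the F1 score at each value can be sketched as follows. The function names, the toy labels and the toy probabilities are our assumptions; precision is taken as $n/N$, following the definition in this section, and a probability equal to the cutoff is assumed to count as positive.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_curve(labels, probabilities, cutoffs):
    """Return a list of (cutoff, F1) pairs."""
    curve = []
    for cutoff in cutoffs:
        predictions = [1 if p >= cutoff else 0 for p in probabilities]
        pairs = list(zip(labels, predictions))
        n_correct = sum(1 for y, y_hat in pairs if y == y_hat)
        n_retrieved = sum(1 for y, y_hat in pairs if y == 1 and y_hat == 1)
        precision = n_correct / len(labels)   # n / N as defined in the text
        recall = n_retrieved / sum(labels)    # n' / N'
        curve.append((cutoff, f1_score(precision, recall)))
    return curve


# Toy labels and predicted probabilities, invented for illustration.
labels = [0, 0, 1, 0, 1, 1]
probabilities = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
curve = f1_curve(labels, probabilities, cutoffs=[0.2, 0.5, 0.8])
best_cutoff, best_f1 = max(curve, key=lambda item: item[1])
print(curve, best_cutoff)
```

The peak of such a curve is the quantity compared across features when ranking their effectiveness.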






Figure 3: (Color online) F1 scores with various cutoff probabilities. Subfigure (a) shows the
F1 scores for promotion, and subfigure (b) shows the F1 scores for resignation. AN means the
action network, and SN means the social network. $k_i$, $k_o$ and $k_s$ are the variables
selected to classify which employees are promoted or resigned.


From the peak values of the curves for promotion, we can see that $k_i$ is the most effective
feature, while $k_o$ is the worst. In such small-scale networks, $k_i$ can reflect an
employee’s direct influence, since leaders are more likely to attract attention. Following
this idea, $k_s$ should also be valid, because the higher an employee’s $k_s$ is, the more
central the position he or she occupies. However, $k_s$ does not work so well. This may be
because too many employees share the highest $k_s$: 39 out of 97 employees in AN and 55 out of
104 employees in SN. These employees have quite different values of $k_i$, ranging from 33 to
82 for AN and from 37 to 68 for SN. So the employees are more distinguishable in terms of
$k_i$ than in terms of $k_s$. Compared with SN, every corresponding feature of AN performs
better. Further, let us check the significance of the estimated coefficients (see the caption
of Fig. 2). Only $k_i$ in AN, $k_o$ in AN, and $k_i$ in SN can be accepted. So we conclude
that the in-degree in the action network is the most effective and acceptable feature for
predicting promotion.

In the case of resignation, although the maximal F1 score belongs to $k_i$ in SN, the three
features of AN are all very competitive. These results make sense because employees who get
little attention (low $k_i$ in SN and AN) or stand at the periphery of the social network (low
$k_s$ in SN) may feel worthless, under-appreciated and lonely. These negative feelings could
result in resignation. In short, employees who have many followers (high $k_i$), stand at
important positions (high $k_s$), or are hubs of communication (high $k_o$ in AN) are less
likely to resign. When we check the significance of the estimated coefficients, only $k_s$ in
AN, $k_s$ in SN and $k_i$ in SN are acceptable. So we conclude that $k_s$ in AN and $k_i$ in
SN are better.

Through the above analysis, we conclude that $k_i$ in AN is most related to promotion, while
$k_s$ in AN and $k_i$ in SN are most related to resignation.


5. Conclusions and Discussion


In this paper, we collect a data set consisting of all the employees’ work-related actions and
online social connections from a social network platform developed and used by a company. To
reveal the correlation between employees’ social/work-related actions and their career
development, we carry out experiments based on the structural features of the two networks.
The correlation analysis shows that work-related interactions are more correlated with both
promotion and resignation, which is consistent with the classification results of the logistic
regression analysis. We find that an employee who gets more attention (higher in-degree) in
the work-related network is more likely to be promoted, while employees with a low k-shell
index in the work-related network and a low in-degree in the social network have a high risk
of resigning. It makes sense that peripheral employees (with low k-shell values) are more
likely to leave, because they may feel worthless, under-appreciated and lonely. Conversely,
one might expect central employees (with high k-shell values) to have a higher probability of
being promoted, but the classification experiments give negative results, which may result
from the fact that too many employees share the highest k-shell value. The results provide a
novel and actionable understanding of enterprise management and suggest that enhancing the
interplay among employees, whether work-related or social, can largely improve employee
loyalty.

In these two employee networks, employees are so highly connected with each other that the
networks show no obvious modules. This may be the nature of small-scale companies, but we can
imagine that connections might be very sparse and the corresponding network might be organized
with community structure in large-scale companies. Moreover, Hedstrom indicated that
organization structures usually differ between countries. So the current conclusions may be
valid only for small-scale enterprises and cannot be directly extended to organizations of
different sizes, in different domains or belonging to different cultures. Within this scope,
however, our conclusions are reasonable. We suggest that human resources departments pay more
attention to employees’ behaviors, e.g., to those who are socially isolated, and that
companies also appreciate the ordinary employees who occupy central positions, because they
are good organizers who can help senior leaders to unite the company.

Furthermore, promotion and resignation are also affected by various other factors. For
example, the employees in central positions may already be senior leaders whose promotion is
restricted by “vacancy chains”. Some employees leave the company simply because their
employment contracts expire. Despite this complexity, we believe that the present work and
further quantitative analyses on non-intervention data could provide novel insights in
addition to those from questionnaire surveys. We hope it could inspire further investigation
on related issues to complement the current conclusions.